 word usage


Hidden 'fingerprints' found in the Bible after thousands of years rewrite the story of the Ark of the Covenant

Daily Mail - Science & tech

Scientists have uncovered hidden patterns in the Bible that challenge ancient beliefs about its origins. Using artificial intelligence, they discovered 'fingerprints' in text throughout the Old Testament, suggesting multiple people wrote the stories. The traditional Jewish and Christian understanding is that Moses wrote the first five books of the Old Testament, including stories about creation, Noah's flood and the Ark of the Covenant. The new study found three distinct writing styles, each with its own vocabulary, tone and focus areas, suggesting multiple authors and sources contributed to the books over time. Researchers used AI to analyze 50 chapters across five books, uncovering inconsistencies in language and content, repeated stories, shifts in tone and internal contradictions.


DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

arXiv.org Artificial Intelligence

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses for this dataset.
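To make the idea of grouping usages into senses from proximity judgments concrete, here is a minimal sketch. It is not the clustering algorithm chosen in the paper: it substitutes a much simpler technique (thresholded edges plus connected components). The function name `cluster_usages`, the numeric judgment scale, and the threshold of 2.5 are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_usages(judgments, threshold=2.5):
    """Group usage ids into clusters (hypothetical helper, not from the paper).

    judgments: iterable of (usage_a, usage_b, score) triples, where score is a
    graded human proximity judgment (assumed: higher means more similar).
    Pairs whose mean score reaches the threshold are linked; each connected
    component of the resulting graph is returned as one cluster.
    """
    scores = defaultdict(list)
    nodes = set()
    for a, b, score in judgments:
        nodes.update((a, b))
        scores[frozenset((a, b))].append(score)

    # Keep an edge only when the mean judgment for the pair is high enough.
    adjacency = {node: set() for node in nodes}
    for pair, values in scores.items():
        if len(pair) != 2:
            continue  # ignore a usage judged against itself
        if sum(values) / len(values) >= threshold:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search to collect connected components.
    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(sorted(component))
    return clusters


example = [("u1", "u2", 4), ("u1", "u2", 3), ("u2", "u3", 1), ("u3", "u4", 4)]
print(cluster_usages(example))
```

Connected components are sensitive to a single noisy high judgment, which is one reason a more robust graph clustering method may be preferred in practice.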


Presence or Absence: Are Unknown Word Usages in Dictionaries?

arXiv.org Artificial Intelligence

There has been a surge of interest in computational modeling of semantic change. Previous work has focused on detecting and interpreting word senses gained over time; however, it remains unclear whether the gained senses are covered by dictionaries. In this work, we aim to fill this research gap by comparing detected word senses with dictionary sense inventories in order to bridge the communities of lexical semantic change detection and lexicography. We evaluate our system in the AXOLOTL-24 shared task for Finnish, Russian and German (Fedorova et al., 2024). Our system is fully unsupervised. It leverages a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries for Subtask 1, and generates dictionary-like definitions for those novel word usages with state-of-the-art large language models such as GPT-4 and LLaMA-3 for Subtask 2. In Subtask 1, our system outperforms the baseline system by a large margin, and it offers interpretability for the mapping results by distinguishing between matched and unmatched (novel) word usages through our graph-based clustering approach. Our system ranks first in Finnish and German, and second in Russian, on the Subtask 2 test-phase leaderboard. These results show the potential of our system in managing dictionary entries, particularly for updating dictionaries to include novel sense entries. Our code and data are publicly available at https://github.com/xiaohemaikoo/axolotl24-ABDN-NLP.


Detection of Non-recorded Word Senses in English and Swedish

arXiv.org Artificial Intelligence

This study addresses the task of Unknown Sense Detection in English and Swedish. The primary objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary or not. For this purpose, sense entries are compared with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder that allows us to model this task in a few-shot scenario. Additionally, we use human annotations to adapt and evaluate our models. Compared to a random sample from a corpus, our model is able to considerably increase the detected number of word usages with non-recorded senses.
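The comparison of word usages with dictionary sense entries can be illustrated with a small sketch. It assumes that each usage and each sense entry has already been turned into a vector by some embedder. The vectors, the function names `cosine` and `is_unrecorded`, and the 0.5 threshold are hypothetical and are not taken from the study.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_unrecorded(usage_vec, sense_vecs, threshold=0.5):
    """Flag a usage as a candidate non-recorded sense (illustrative rule only).

    The usage is flagged when its best similarity to any dictionary sense
    entry stays below the threshold, i.e. no recorded sense is close enough.
    """
    best = max((cosine(usage_vec, s) for s in sense_vecs), default=0.0)
    return best < threshold


# Toy 2-d vectors standing in for real embeddings.
print(is_unrecorded([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # close to a recorded sense
print(is_unrecorded([1.0, 0.0], [[0.0, 1.0]]))              # far from every recorded sense
```

In a real setting the threshold would need tuning against human annotations rather than being fixed by hand.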


Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis

arXiv.org Artificial Intelligence

We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users (historical linguists, lexicographers, or social scientists) to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the 'definitions as representations' paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.
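Choosing the most prototypical definition in a usage cluster can be sketched as picking the definition that is, on average, most similar to the others. The abstract does not say how prototypicality is measured, so the snippet below uses plain token overlap (Jaccard similarity) as a stand-in. The function names and the example definitions are invented for illustration.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two definition strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def most_prototypical(definitions):
    """Return the definition with the highest mean similarity to the rest.

    Assumption: token overlap is a crude proxy for the similarity measure a
    real system would use (for example, distances between embeddings).
    """
    if not definitions:
        raise ValueError("cluster has no definitions")

    def mean_similarity(i):
        others = [jaccard(definitions[i], d)
                  for j, d in enumerate(definitions) if j != i]
        return sum(others) / len(others) if others else 0.0

    best_index = max(range(len(definitions)), key=mean_similarity)
    return definitions[best_index]


cluster = [
    "a small domesticated animal",
    "a small animal kept as a pet",
    "a small domesticated animal kept as a pet",
]
print(most_prototypical(cluster))
```

The selected string then serves as a human-readable label for the whole cluster.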


Smart Word Suggestions for Writing Assistance

arXiv.org Artificial Intelligence

Enhancing word usage is a desired feature for writing assistance. To further advance research in this area, this paper introduces the "Smart Word Suggestions" (SWS) task and benchmark. Unlike other works, SWS emphasizes end-to-end evaluation and presents a more realistic writing assistance scenario. This task involves identifying words or phrases that require improvement and providing substitution suggestions. The benchmark includes human-labeled data for testing, a large distantly supervised dataset for training, and the framework for evaluation. The test data includes 1,000 sentences written by English learners, accompanied by over 16,000 substitution suggestions annotated by 10 native speakers. The training dataset comprises over 3.7 million sentences and 12.7 million suggestions generated through rules. Our experiments with seven baselines demonstrate that SWS is a challenging task. Based on experimental analysis, we suggest potential directions for future research on SWS. The dataset and related code are available at https://github.com/microsoft/SmartWordSuggestions.


Computer-Aided Modelling of the Bilingual Word Indices to the Ninth-Century Uchitel'noe evangelie

arXiv.org Artificial Intelligence

The development of bilingual dictionaries to medieval translations presents diverse difficulties. These result from two types of philological circumstances: a) the asymmetry between the source language and the target language; and b) the varying available sources of both the original and translated texts. In particular, Tihova's full critical edition of Constantine of Preslav's Uchitel'noe evangelie ('Didactic Gospel') gives a relatively good idea of the Old Church Slavonic translation but not of its Greek source text. This is because Cramer's edition of the catenae, used as the parallel text in it, is based on several codices whose text does not fully coincide with the Slavonic. This leads to the addition of newly discovered parallels from Byzantine manuscripts and John Chrysostom's homilies. Our approach to these issues is a step-wise process with two main goals: a) to facilitate the philological annotation of input data and b) to consider the manifestations of the mentioned challenges, first separately, in order to simplify their resolution, and then in combination. We demonstrate how we model various types of asymmetric translation correlates and the variability resulting from the pluralism of sources. We also demonstrate how all these constructions are modelled and processed into the final indices. Our approach is designed with generalisation in mind and is intended to be applicable to other translations from Greek into Old Church Slavonic as well.


How to Improve Corporate Culture with Artificial Intelligence

#artificialintelligence

Contrary to press-propagated blame on rapid industry changes, unforeseen circumstances and uncontrollable crises, most business failures boil down to poor corporate culture. Interestingly, how corporate culture is perceived has changed just as rapidly as industries have evolved in recent times. In the 20th and early 21st centuries, assessment of corporate culture focused almost entirely on how businesses treated their customers. For instance, the dent in Blackberry's culture was caused by the company prioritizing its smartphone technology over customers' needs. Meanwhile, how customers interact with technology was changing. More recently, corporate culture has more to do with how companies manage communication internally than with their public relations.


Alzheimer's Prediction May Be Found in Writing Tests

#artificialintelligence

The researchers examined the subjects' word usage with an artificial intelligence program that looked for subtle differences in language. It identified one group of subjects who were more repetitive in their word usage at that earlier time when all of them were cognitively normal. These subjects also made errors, such as spelling words wrongly or inappropriately capitalizing them, and they used telegraphic language, meaning language that has a simple grammatical structure and is missing subjects and words like "the," "is" and "are." The members of that group turned out to be the people who developed Alzheimer's disease. The A.I. program predicted, with 75 percent accuracy, who would get Alzheimer's disease, according to results published recently in The Lancet journal EClinicalMedicine.


How law enforcement agencies use artificial intelligence to fight crime | 7wData

#artificialintelligence

Artificial intelligence (AI) has been on everyone's lips lately, and for good reason. The technology is constantly finding new applications and has already transformed a number of industries, including healthcare, communications, automotive, and finance, with others set to follow in the near future. Given the stakes involved, it may not be particularly surprising that law enforcement has somewhat lagged behind other sectors when it comes to the adoption of artificial intelligence. However, that's slowly starting to change, with law enforcement agencies around the world increasingly turning to AI to help them fight crime. A recent report published by MarketsandMarkets estimates that the global law enforcement software market will grow from $10 billion in 2017 to $18 billion by 2023.